João Helis Bernardo
IFRN, UFRN, Brazil
Daniel Alencar da Costa
Queen's University, Canada
Uirá Kulesza
UFRN, Brazil
In this work, we analyze 162,653 pull requests (PRs) from 87 GitHub projects, implemented in 5 different programming languages, to empirically investigate the impact of adopting Continuous Integration (CI) on the time-to-deliver of merged pull requests.
In particular, we address the following research questions:
RQ1: Are merged pull requests released more quickly using continuous integration?
RQ2: Does the increased development activity after adopting CI increase the delivery time of pull requests?
RQ3: What factors impact the delivery time after adopting continuous integration?
Here we present the results for each RQ that we address.
RQ1: Are merged pull requests released more quickly using continuous integration?
Interestingly, we find that the time to deliver PRs is shorter after the adoption of CI in only 51.3% of the projects. In addition, we find that in 62 of the 87 studied projects (71.3%), the merge time of PRs increases after adopting CI.
RQ1: See Mann-Whitney-Wilcoxon test result and Cliff's delta values
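To make the two statistics named above more concrete, here is a minimal, dependency-free sketch of the Mann-Whitney U statistic and Cliff's delta. It is not the analysis script used in the study: the function names and the sample delivery delays are hypothetical, and the p-value computation is omitted (in practice a statistics library such as SciPy's `scipy.stats.mannwhitneyu` would typically be used for that).

```python
from itertools import product


def cliffs_delta(xs, ys):
    """Cliff's delta: share of pairs with x > y minus share of pairs with x < y."""
    greater = sum(1 for x, y in product(xs, ys) if x > y)
    less = sum(1 for x, y in product(xs, ys) if x < y)
    return (greater - less) / (len(xs) * len(ys))


def mann_whitney_u(xs, ys):
    """U statistic for xs: number of pairs with x > y, ties counted as 0.5."""
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x, y in product(xs, ys)
    )


# Hypothetical delivery delays (in days) for one project; not real data.
before_ci = [10, 12, 15, 20, 30]
after_ci = [5, 8, 12, 14, 18]

delta = cliffs_delta(before_ci, after_ci)
u = mann_whitney_u(before_ci, after_ci)
print(f"Cliff's delta = {delta:.2f}, U = {u}")
```

A positive delta here means that the "before" values tend to be larger than the "after" values; the two statistics are linked by delta = 2U / (n * m) - 1, where n and m are the sample sizes.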
RQ2: Does the increased development activity after adopting CI increase the delivery time of pull requests?
We find a considerable increase in the number of PR submissions and in the churn per release after adopting CI. The increased PR submissions and churn are key reasons why projects deliver PRs more slowly after adopting CI. 71.3% of the projects increase the rate of PR submissions after adopting CI.
RQ3: What factors impact the delivery time after adopting continuous integration?
Our models indicate that, before the adoption of CI, the integration load of the development team, i.e., the number of submitted PRs competing to be merged, is the most impactful metric on the delivery time of PRs. On the other hand, our models reveal that, after the adoption of CI, PRs that are merged recently in a release cycle are likely to have a slower delivery time.
RQ3: Explanatory Power of the models' variables (Before and After CI)
For replication purposes, we make our datasets publicly available to the interested reader.
The table below details the meta-data that we fetch for the 162,653 pull requests of our dataset.
# | Variable | Definition |
---|---|---|
1 | project | Full name of the GitHub repository |
2 | language | Programming language used by the project |
3 | pull_id | Pull request ID |
4 | pull_number | Pull request number |
5 | commits_per_pr | The number of commits per PR |
6 | changed_files | The number of files linked to a pull request submission |
7 | churn | The number of added lines plus the number of deleted lines in a pull request |
8 | comments | The number of comments of a pull request |
9 | comments_interval | The sum of the time intervals (days) between comments divided by the total number of comments of a pull request |
10 | merge_workload | The number of PRs that were created and are still waiting to be merged by a core integrator at the moment at which a specific pull request is submitted |
11 | description_length | The number of characters in the body (description) of the PR |
12 | contributor_experience | The number of previously released pull requests that were submitted by the contributor of a particular PR. We consider the author of the pull request to be its contributor |
13 | queue_rank | The number that represents the moment when a pull request is merged compared to other merged PRs in the release cycle. For example, in a queue that contains 100 PRs, the first merged PR has position 1, while the last merged pull request has position 100 |
14 | contributor_integration | The average time, in days, taken to release the previous PRs that were submitted by a particular contributor |
15 | stacktrace_attached | Whether the pull request has a stack trace attached in its description |
16 | activities | The number of activities of a pull request. An activity is an entry in the pull request's history |
17 | merge_time | Number of days between the submission and merge of a pull request |
18 | delivery_delay | Number of days between the merge and the delivery of a pull request |
19 | practice | Whether a pull request was submitted before or after the adoption of CI |
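
As an illustration of how a few of the time-based variables in the table could be derived from raw timestamps, the sketch below follows the definitions of `merge_time`, `delivery_delay`, `queue_rank`, and `comments_interval` given above. It is an assumption-laden sketch, not the extraction code used to build the dataset: the helper names, the use of fractional days, and the value returned for a PR without comments are all hypothetical choices.

```python
from datetime import datetime

SECONDS_PER_DAY = 86400.0


def days_between(start, end):
    """Elapsed time in (fractional) days; whole-day rounding is not assumed."""
    return (end - start).total_seconds() / SECONDS_PER_DAY


def queue_ranks(merged_at_by_pr):
    """Map each PR id to its 1-based merge position within one release cycle."""
    ordered = sorted(merged_at_by_pr, key=merged_at_by_pr.get)
    return {pr: rank for rank, pr in enumerate(ordered, start=1)}


def comments_interval(comment_times):
    """Sum of the gaps (days) between comments divided by the number of comments."""
    if not comment_times:
        return 0.0  # assumption: PRs without comments get 0
    times = sorted(comment_times)
    gaps = sum(days_between(a, b) for a, b in zip(times, times[1:]))
    return gaps / len(times)


# Hypothetical timestamps for a single pull request.
submitted = datetime(2017, 1, 1)
merged = datetime(2017, 1, 4)
released = datetime(2017, 1, 10)

merge_time = days_between(submitted, merged)      # submission -> merge
delivery_delay = days_between(merged, released)   # merge -> delivery

# Hypothetical merge dates of three PRs in the same release cycle.
ranks = queue_ranks({
    "a": datetime(2017, 1, 4),
    "b": datetime(2017, 1, 2),
    "c": datetime(2017, 1, 6),
})
interval = comments_interval(
    [datetime(2017, 1, 1), datetime(2017, 1, 2), datetime(2017, 1, 4)]
)
```
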
Use the link below to download all the pull requests of our dataset and their respective meta-data as a CSV file.
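Once downloaded, the pull request data can be explored with a few lines of Python. The sketch below computes the median `delivery_delay` per project and per `practice` value. It assumes a comma-separated file with a header row whose column names match the variables in the table above; the in-memory sample, the project name `org/repo`, and the `before`/`after` labels are hypothetical stand-ins, since the exact encoding of the `practice` column is not documented here.

```python
import csv
import io
from collections import defaultdict
from statistics import median


def median_delay_by_practice(csv_file):
    """Return {(project, practice): median delivery_delay} from an open CSV file."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        key = (row["project"], row["practice"])
        groups[key].append(float(row["delivery_delay"]))
    return {key: median(values) for key, values in groups.items()}


# Hypothetical rows standing in for the downloaded file.
sample = io.StringIO(
    "project,practice,delivery_delay\n"
    "org/repo,before,10\n"
    "org/repo,before,20\n"
    "org/repo,after,4\n"
    "org/repo,after,6\n"
    "org/repo,after,8\n"
)
medians = median_delay_by_practice(sample)
for (project, practice), value in sorted(medians.items()):
    print(project, practice, value)
```

To run this against the real file, the `io.StringIO` sample would be replaced by something like `open(path, newline="")`, where `path` points to the downloaded CSV.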
We also provide a second dataset that contains all the releases (7,495) of the 87 studied projects and their respective meta-data.
The table below details all the meta-data that we fetch for the releases.
# | Variable | Definition |
---|---|---|
1 | project | Full name of the GitHub repository |
2 | language | Programming language used by the project |
3 | title | The release tag name |
4 | startedAt | Start date of a release |
5 | publishedAt | Publication date of a release |
6 | release_duration | The difference in days between the start and publication date of a release |
7 | created_pull_requests | The number of submitted pull requests per release |
8 | merged_pull_requests | The number of merged pull requests per release |
9 | released_pull_requests | The number of delivered pull requests per release |
10 | sum_submitted_pr_churn | The sum of submitted PR code churn per release |
11 | practice | Whether a release was published before or after the adoption of CI |
Use the link below to download all the releases of our dataset and their respective meta-data as a CSV file.
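The releases dataset lends itself to checks in the spirit of RQ2. As a rough sketch, the code below estimates the share of projects whose mean number of submitted PRs per release is higher after CI than before. This is only one possible way to operationalize the "rate of PR submissions" and is not necessarily the measure used in the study; the function name, the sample rows, and the `before`/`after` labels for `practice` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def share_with_more_submissions(releases, before="before", after="after"):
    """Share of projects whose mean created PRs per release is higher after CI."""
    per_project = defaultdict(lambda: defaultdict(list))
    for rel in releases:
        count = int(rel["created_pull_requests"])
        per_project[rel["project"]][rel["practice"]].append(count)
    # Only projects with releases on both sides of the CI adoption are comparable.
    comparable = [p for p in per_project.values() if p[before] and p[after]]
    if not comparable:
        return 0.0
    increased = sum(1 for p in comparable if mean(p[after]) > mean(p[before]))
    return increased / len(comparable)


# Hypothetical release rows, e.g. as produced by csv.DictReader.
releases = [
    {"project": "org/a", "practice": "before", "created_pull_requests": "10"},
    {"project": "org/a", "practice": "before", "created_pull_requests": "20"},
    {"project": "org/a", "practice": "after", "created_pull_requests": "30"},
    {"project": "org/b", "practice": "before", "created_pull_requests": "40"},
    {"project": "org/b", "practice": "after", "created_pull_requests": "10"},
    {"project": "org/b", "practice": "after", "created_pull_requests": "20"},
]
share = share_with_more_submissions(releases)
print(f"{share:.1%} of projects submit more PRs per release after CI")
```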